Exploiting social networks dynamics for P2P 
resource organisation 



Vincenza Carchiolo, Michele Malgeri, Giuseppe Mangioni, and Vincenzo Nicosia

Dipartimento di Ingegneria Informatica e delle Telecomunicazioni 
Facoltà di Ingegneria - Università di Catania
Viale A. Doria 6 - 95100 Catania (ITALY)
{car,malgeri,gmangioni,vnicosia}@diit.unict.it



Abstract. In this paper we present a formal description of PROSA,
a P2P resource management system heavily inspired by social networks.
Social networks have been studied in depth over the last two decades in
order to understand how communities of people arise and grow. It is a
well-known result that networks of social relationships usually evolve
into small-worlds, i.e. networks where nodes are strongly connected to
their neighbours and separated from all other nodes by a small number of
hops. This work shows that the algorithms implemented in PROSA yield
an efficient small-world P2P network.

1 Introduction 

A Peer-to-Peer system consists of computing elements that are connected
by a network, addressable in a unique way, and sharing a common
communication protocol. All computing elements, equivalently called nodes
or peers, have the same functionalities and role. In P2P networks there is
no difference between "client" and "server" hosts: a peer acts as a "client"
if it requests a resource from the network, while it acts as a "server" if
it is asked for a resource it is sharing. From this point of view, P2P
networks differ a lot from the World Wide Web, TCP/IP networks and, in
general, from client-server networks.

Studies on P2P networks are focused on two different topics: physical
P2P networks (i.e. P2P networks as opposed to hierarchical and centralised
TCP/IP networks) and overlay networks (i.e. networks of logical links
between hosts over an existing physical network of any type). Our interest
is mainly focused on overlay P2P systems: they are probably going to
become the most used kind of application-level protocols for resource
sharing and organisation.

In this paper we present a novel P2P overlay network, named PROSA,
heavily inspired by social networks. Social networks are sets of people or
groups interconnected by means of acquaintance, interaction, friendship
or collaboration links. Many kinds of natural social networks have been
studied in depth over the last thirty years [2], and many interesting
characteristics of such networks have been discovered. In a real social
network, relationships among people are of the utmost importance to
guarantee efficient collaboration, resource discovery and fast retrieval
of remote people. Nevertheless, not all relationships in a social network
are of the same importance: usually links to parents and relatives are
stronger than links to friends, which are in turn stronger than links to
colleagues and classmates. On the other hand, it is also interesting to
note that links in a social group usually evolve in different ways. A large
number of relationships are (and remain) bare "acquaintances"; some of
them evolve over time into "friendships", while "relativeness" is typical
of very strong links to really trusted people.

This suggests that a P2P network based on a social model should take
into account that different kinds of links among peers can exist, and that
links can evolve from simple acquaintances to friendships.
Results of studies performed by Watts, Strogatz, Newman, Barabasi et
al. in the last decades [7] [4] [6] [8] [1] reveal that networks of movie
characters, scientific collaborations, food chains, protein dependence,
computers, web pages and many other natural networks usually exhibit
emerging properties, such as that of being small-worlds. A small-world is
a network where the distance among nodes grows as a logarithmic function
of the network size and similar nodes are strongly connected in clusters.
PROSA tries to build a P2P network based on social relationships, in
the hope that such a network could naturally evolve into a small-world.
In section 2, we describe PROSA and the algorithms involved in linking
peers and routing queries for resources; in section 3, we report some
results about the topological properties of a PROSA network, obtained by
simulation; in section 4, we summarise the obtained results and plan future
work.

2 PROSA: A brief introduction

As stated above, PROSA is a P2P network based on social relationships.
More formally, we can model PROSA as a directed graph:

$$\mathrm{PROSA} = (V, L, P_k, Label) \tag{1}$$

$V$ denotes the set of peers (i.e. vertices), $L$ is the set of links $l = (s,t)$
(i.e. edges), where $t$ is a neighbour of $s$. For link $l = (s,t)$, $s$ is the source
peer and $t$ is the target peer. All links are directed.
In P2P networks the knowledge of a peer is represented by the resources
it shares with other peers. In PROSA the mapping $P_k : V \to C$
associates peers with resources. For a given peer $s \in V$, $P_k(s)$ is a compact
description of the peer knowledge (PK - Peer Knowledge).
Relationships among people are usually based on similarities in interests,
culture, hobbies, knowledge and so on. Usually these kinds of links
evolve from simple "acquaintance-links" to what we call "semantic-links".
To implement this behaviour three types of links have been introduced:
Acquaintance-Link (AL), Temporary Semantic-Link (TSL) and
Full Semantic-Link (FSL). TSLs represent relationships based on a
partial knowledge of a peer. They are usually stronger than ALs and weaker
than FSLs.



In PROSA, if a given link is a simple AL, it means that the source peer
does not know anything about the target peer. If the link is an FSL, the
source peer is aware of the kind of knowledge owned by the target peer
(i.e. it knows $P_k(t)$, where $t \in V$ is the target peer). Finally, if the
link is a TSL, the peer does not know the full $P_k(t)$ of the linked peer;
it instead has a Temporary Peer Knowledge ($TP_k$), which is built from the
queries previously received from that peer. The different meanings
of links are modelled by means of a labelling function $Label$: for a given
link $l = (s,t) \in L$, $Label(l)$ is a vector of two elements $[e,w]$: the former
is the link label and the latter is a weight used to model what the source
peer knows of the target peer; this is computed as follows:

- if $e = AL \Rightarrow w = \emptyset$
- if $e = TSL \Rightarrow w = TP_k$
- if $e = FSL \Rightarrow w = P_k(t)$

In the next two sections, we give a brief description of how PROSA 
works. A detailed description of PROSA can be found in [3]. 

2.1 Peer Joining to PROSA 

The case of a node that wants to join an existing network is similar to
the birth of a child. At the beginning of his life a child "knows" just
a couple of people (his parents). A new peer that wants to join simply
looks for n peers at random and establishes ALs to them. These links
are ALs because a new peer does not know anything about its neighbours
until it asks them for resources. This behaviour is quite easy
to understand: when a baby comes to life he does not know anything
about his parents. The PROSA peer joining procedure is represented
in Algorithm 1.



Algorithm 1 JOIN: Peer s joining PROSA (V, L, P_k, Label)
Require: PROSA (V, L, P_k, Label), Peer s
1: RP ← rnd(V, n) {Randomly selects n peers of PROSA}
2: V ← V ∪ {s} {Adds s to the set of peers}
3: L ← L ∪ {(s, t), ∀t ∈ RP} {Links s with the randomly selected peers}
4: ∀t ∈ RP: Label(s, t) ← [AL, ∅] {Sets the previously added links as AL}
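
The joining procedure can be illustrated with a minimal Python sketch. The container class `Prosa`, its field names and the `join` signature are hypothetical and chosen only for illustration; peer knowledge is assumed to be a plain dictionary, and the empty weight of an AL is represented by `None`.

```python
import random
from dataclasses import dataclass, field

AL, TSL, FSL = "AL", "TSL", "FSL"  # the three link types of Section 2


@dataclass
class Prosa:
    """Toy container for the network tuple; all names here are illustrative."""
    peers: set = field(default_factory=set)        # set of peers
    knowledge: dict = field(default_factory=dict)  # peer -> knowledge description
    labels: dict = field(default_factory=dict)     # (s, t) -> [e, w]; keys are the links


def join(net, s, pk_s, n, rng=random):
    """Link the new peer s to at most n randomly chosen existing peers via ALs."""
    candidates = sorted(net.peers)                 # fixed order before sampling
    chosen = rng.sample(candidates, min(n, len(candidates)))
    net.peers.add(s)
    net.knowledge[s] = pk_s
    for t in chosen:
        net.labels[(s, t)] = [AL, None]            # empty weight: s knows nothing about t
    return chosen
```

Storing the labels in a dictionary keyed by `(source, target)` keeps links directed, so a link from s to t does not imply one from t to s.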



2.2 PROSA dynamics 

In order to show how PROSA works, we need to define the structure
of a query message. Each query message is a quadruple:

$$Q_M = (qid, q, s, n_r) \tag{2}$$

where $qid$ is a unique query identifier to ensure that a peer does not
respond to a query more than once; $q$ is the query, expressed according
to the used knowledge model¹; $s \in V$ is the source peer and $n_r$ is the
number of required results. PROSA dynamic behaviour is modelled
by Algorithm 2 and is strictly related to queries. When a user ($U$) of
PROSA asks for a resource on a peer $s$, the inquired peer $s$ builds up
a query $q$ and specifies the number of results $n_r$ it wants to obtain.
This is equivalent to calling $ExecQuery(\mathrm{PROSA}, s, \emptyset, q_m)$, where $q_m$ is
the query message built from $q$ and $n_r$.



Algorithm 2 ExecQuery: query q originating from peer s executed on peer cur
Require: PROSA (V, L, P_k, Label)
Require: cur, prev ∈ V, q_m ∈ QueryMessage
 1: Result ← ∅
 2: if prev ≠ ∅ then
 3:   UpdateLink(PROSA, cur, prev, q)
 4: end if
 5: (Result, numRes) ← ResourcesRelevance(PROSA, q, cur, n_r)
 6: if numRes = 0 then
 7:   f ← SelectForwarder(PROSA, cur, q)
 8:   if f ≠ ∅ then
 9:     ExecQuery(PROSA, f, cur, q_m)
10:   end if
11: else
12:   SendMessage(s, cur, Result)
13:   L ← L ∪ {(s, cur)}
14:   Label(s, cur) ← [FSL, P_k(cur)]
15:   if numRes < n_r then
16:     {- Semantic Flooding -}
17:     for all t ∈ Neighborhood(cur) do
18:       rel ← PeerRelevance(P_k(t), q)
19:       if rel > Threshold then
20:         q_m ← (qid, q, s, n_r − numRes)
21:         ExecQuery(PROSA, t, cur, q_m)
22:       end if
23:     end for
24:   end if
25: end if



¹ If knowledge is modelled by the Vector Space Model, for example, q is a state vector of
stemmed terms. If knowledge is modelled by ontologies, q is an ontological query,
and so on.

The first time ExecQuery is called, $prev$ is equal to $\emptyset$, and this avoids
the execution of instruction number 3. Subsequent calls of ExecQuery, i.e.
when a peer receives a query forwarded by another peer, use function
UpdateLink, which updates the link between the current peer $cur$ and the
forwarding peer $prev$, if necessary. If the requesting peer is an unknown
peer, a new TSL link to that peer is added, having as weight a Temporary
Peer Knowledge ($TP_k$) based on the received query message. Note that
a $TP_k$ can be considered as a "good hint" for the current peer, in order
to gain links to other remote peers. It is very likely that the query
will finally be answered by some other peer and that the requesting
peer will download all resources that matched it. It would be useful to
record a link to that peer, just in case that kind of resource is
requested in the future by other peers. If the requesting peer is a TSL
for the peer that receives the query, the corresponding TPV (Temporary
Peer Vector) in the list is updated. If the requesting peer is an FSL, no
updates are performed.
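
The three cases of the link update can be sketched as follows. This is only an illustration under several assumptions not stated in the text: queries follow the Vector Space Model and are represented as term-to-weight dictionaries, a temporary peer knowledge is the running sum of the queries received so far, the link is the one from the receiving peer to the peer the query came from, and an existing AL is treated like an unknown peer. The name `update_link` and its signature are hypothetical.

```python
from collections import Counter

AL, TSL, FSL = "AL", "TSL", "FSL"


def update_link(labels, cur, prev, q):
    """Update the link (cur, prev) after cur receives query q from prev.

    labels maps (source, target) -> [link_type, weight];
    q is a term -> weight dictionary (Vector Space Model assumption).
    """
    link = labels.get((cur, prev))
    if link is None or link[0] == AL:
        # Unknown peer (or, by assumption, a bare acquaintance):
        # create a TSL whose temporary knowledge is seeded with this query.
        labels[(cur, prev)] = [TSL, Counter(q)]
    elif link[0] == TSL:
        # Existing TSL: fold the new query into the temporary knowledge.
        link[1].update(q)
    # FSL: the full peer knowledge is already known, nothing to update.
    return labels[(cur, prev)]
```

Summing term weights is just one possible way to accumulate a temporary knowledge; any other aggregation of past queries would fit the same three-branch structure.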

The relevance of a query with respect to the resources hosted by the
user's peer is evaluated by calling function ResourcesRelevance. Two
possible cases can hold:

- If none of the hosted resources has a sufficient relevance, the query
  has to be forwarded to another peer $f$, called "forwarder". This peer
  is selected among the neighbours of the current peer by the function
  SelectForwarder, using the following procedure:

  - The current peer computes the relevance between query $q$ and the
    weight of each link connecting it to its neighbourhood.
  - It selects the link with the highest relevance, if any, and forwards
    the query message to it.
  - If the peer has neither FSLs nor TSLs, i.e. it has just ALs, the
    query message is forwarded to one link at random.

  This procedure is described in Algorithm 2, where the subsequent
  forwards are performed by means of recursive calls to ExecQuery.

- If the peer hosts resources with sufficient relevance with respect to
  $q$, two sub-cases are possible:

  - The peer has enough relevant documents to fulfil the request.
    In this case a result message is sent to the requesting peer and
    the query is not forwarded any further.
  - The peer has a certain number of relevant documents, but they
    are not enough to fulfil the request (i.e. they are $< n_r$). In this
    case a response message is sent to the requesting peer, specifying
    the number of matching documents. The query message is
    forwarded to all the links in the neighbourhood whose relevance
    with the query is higher than a given threshold (semantic flooding).
    The number of matched resources is subtracted from the
    number of total requested documents before each forward step.
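
The forwarder selection described in the first case can be sketched in Python. The relevance measure is not specified here, so cosine similarity between term-to-weight dictionaries is assumed purely for illustration; `cosine` and `select_forwarder` are hypothetical names, and the sketch returns the best semantic link even when its relevance is zero.

```python
import math
import random

AL, TSL, FSL = "AL", "TSL", "FSL"


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def select_forwarder(labels, cur, q, rng=random):
    """Return the neighbour of cur whose link weight is most relevant to q.

    Falls back to a random neighbour when cur holds only ALs,
    and returns None when cur has no outgoing links at all.
    """
    out = {t: lab for (s, t), lab in labels.items() if s == cur}
    semantic = {t: cosine(q, w) for t, (e, w) in out.items() if e in (TSL, FSL)}
    if semantic:
        # Ties are broken by peer name so the choice is reproducible.
        return max(sorted(semantic), key=semantic.get)
    return rng.choice(sorted(out)) if out else None
```

For example, a peer holding an FSL weighted on "jazz" and a TSL weighted on "rock" would forward a query about "rock" along the TSL, while a freshly joined peer with only ALs would pick one of them at random.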

When the requesting peer receives a response message it presents the 
results to the user. If the user decides to download a certain resource 
from another peer, the requesting peer contacts the peer owning that 
resource asking for download. If download is accepted, the resource is 
sent to the requesting peer. 

3 Topological properties 

Algorithms described in section 2 are inspired by the way social
relationships among people evolve, in the hope that a network based on
those simple rules could naturally become a small-world. Being
a small-world is one of the most desirable properties of a P2P network,
since resource retrieval in small-worlds is really efficient. This is mainly
due to the fact that small-world networks have a short Average Path
Length (APL) and a high Clustering Coefficient (CC). APL is defined
as the average number of hops required to reach any other node in the
network: if APL is small, all nodes of the network can be easily reached
in a few steps starting from any other node.

CC can be defined in several ways, depending on the kind of "clustering"
you are referring to. We used the definition given in [7], where the
clustering coefficient of a node $n$ is defined as:

$$CC_n = \frac{E_{n,real}}{E_{n,tot}} \tag{3}$$

where $n$'s neighbours are all the peers to which $n$ is linked, $E_{n,real}$ is
the number of edges between $n$'s neighbours and $E_{n,tot}$ is the maximum
number of possible edges between $n$'s neighbours. Note that if $k$ is in
the neighbourhood of $n$, the converse is not guaranteed, due to the fact
that links are directed. The clustering coefficient of the whole network is
defined as:

$$CC = \frac{1}{|V|} \sum_{n \in V} CC_n \tag{4}$$

i.e. the average clustering coefficient over all nodes.

The CC is an estimate of how strongly nodes are connected to each
other and to their neighbourhood. In particular, the definition given in
Equation 3 measures the fraction of links among a node's neighbours
with respect to the total possible number of links among them.
In the following two subsections we show that PROSA has both a small
APL and a considerably high CC.
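
As an illustration of the clustering coefficient defined in Equation 3, the following Python sketch computes it on a directed graph given as a set of `(source, target)` pairs. Two assumptions are made here: the maximum number of edges among k neighbours is taken as k(k-1), since links are directed, and a node with fewer than two neighbours is given a coefficient of 0. Function names are hypothetical.

```python
def clustering_coefficient(edges, node):
    """Ratio of existing to possible directed edges among node's out-neighbours."""
    neigh = {t for s, t in edges if s == node and t != node}
    k = len(neigh)
    if k < 2:
        return 0.0                     # assumption: undefined case counted as 0
    e_real = sum(1 for s, t in edges if s in neigh and t in neigh and s != t)
    return e_real / (k * (k - 1))      # k(k-1) ordered pairs among k neighbours


def network_cc(edges, nodes):
    """Average of the per-node clustering coefficients over all nodes."""
    return sum(clustering_coefficient(edges, n) for n in nodes) / len(nodes)
```

With links a→b, a→c and b→c, node a has two neighbours and one of the two possible directed edges between them, so its coefficient is 0.5; adding c→b raises it to 1.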



3.1 Average path length 

Since we are focusing on the topological properties of a PROSA network to
show that it is a small-world (i.e. that queries in PROSA are answered
in a small number of steps), we estimate the APL as the average length
of the path traversed by a query. It is interesting to compare the APL of
PROSA with the APL of a corresponding random graph, since random
graphs usually have a really small average path length.
Given a graph $G(V,E)$ with $|V|$ vertices (nodes) and $|E|$ edges (links)
among nodes, the corresponding random graph is a graph $G_{rnd}$ which has
the same number of vertices (nodes) and the same number of edges (links)
as $G$, and where each link between two nodes exists with a probability $p$.
Note that the APL of a random graph can be calculated using equation
(5), as reported in [5], where $|V|$ is the number of vertices (nodes) and
$|E|$ is the number of edges (links), so that $|E|/|V|$ is the average number
of links per node.

$$APL_{rnd} = \frac{\log |V|}{\log (|E|/|V|)} \tag{5}$$
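
A small Python sketch of this estimate follows, reading Equation 5 as the logarithm of the number of nodes divided by the logarithm of the average number of links per node (the usual random-graph estimate). The function name and the example figures are hypothetical and are not taken from the simulations.

```python
import math


def apl_random(num_vertices, num_edges):
    """Estimate the APL of a random graph as log|V| / log(|E|/|V|)."""
    mean_degree = num_edges / num_vertices
    if mean_degree <= 1:
        raise ValueError("estimate needs more than one link per node on average")
    return math.log(num_vertices) / math.log(mean_degree)
```

For a hypothetical network of 200 nodes with 15 links per node (3000 links), `apl_random(200, 3000)` gives roughly 1.96. The ratio does not depend on the base of the logarithm.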

Figure 1 shows the APL for PROSA and the corresponding random
graph for different numbers of nodes in the case of 15 performed queries
per node. The APL for PROSA is about 3.0, for all network sizes, while
the APL for the corresponding random graph is between 1.75 and 2.0:
the average distance among peers in PROSA seems to be independent
of the size of the network. This is quite common in real small-world
networks.



Fig. 1. APL for PROSA and random network



It is also interesting to analyse how APL changes when the total number
of performed queries increases. Results are reported in Figure 2, where
the APL is calculated for windows of 300 queries, with an overlap of
50 queries. Note that the APL for PROSA decreases with the number
of performed queries. This behaviour heavily depends on the fact that
new links among nodes arise whenever a new query is performed (TSLs)
or successfully answered (FSLs). The higher the number of performed
queries, the higher the probability that a link between two nodes
exists.
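
The windowed averaging can be sketched as below, assuming the path length of each query is recorded in order of execution and that consecutive windows share the stated overlap, so that each window starts 250 queries after the previous one. The function name and the handling of a trailing incomplete window (it is dropped) are assumptions.

```python
def windowed_means(path_lengths, window=300, overlap=50):
    """Mean path length over consecutive windows sharing `overlap` queries."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    means = []
    # Only full windows are averaged; a trailing partial window is dropped.
    for start in range(0, len(path_lengths) - window + 1, step):
        chunk = path_lengths[start:start + window]
        means.append(sum(chunk) / window)
    return means
```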




Fig. 2. Running averages of APL for PROSA with different network size 



3.2 Clustering Coefficient 



The clustering (or transitivity) of a network is a measure of how strongly
nodes are connected to their neighbourhood. Since links among nodes in
PROSA are established as a consequence of query forwarding and
answering, we suppose that peers with similar knowledge will eventually be
linked together. This means that peers usually have a neighbourhood of
similar peers, and having strong connections with neighbours could really
speed up resource retrieval.

In Figure 3 the CC of PROSA for different numbers of performed queries
is reported, for a network of 200 nodes. Note that the clustering
coefficient of the network increases when more queries are performed. This
means that nodes in PROSA usually forward queries to a small number
of other peers so that their aggregation level naturally gets stronger
when more queries are issued.

Fig. 3. CC for PROSA



It could be interesting to compare the PROSA clustering coefficient with
that of a corresponding random graph. The clustering coefficient of a
random graph with $|V|$ vertices (nodes) and $|E|$ edges (links) can be
computed using equation 6.

$$CC_{rnd} = \frac{|E|}{|V| \cdot (|V| - 1)} \tag{6}$$
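
As a quick numerical illustration, the sketch below assumes that Equation 6 is the link density of a directed graph, i.e. the number of links divided by the number of ordered pairs of distinct nodes. The function name and the example figures are hypothetical.

```python
def cc_random(num_vertices, num_edges):
    """Clustering coefficient of a directed random graph: |E| / (|V| * (|V| - 1))."""
    if num_vertices < 2:
        raise ValueError("need at least two vertices")
    return num_edges / (num_vertices * (num_vertices - 1))
```

For a hypothetical network of 200 nodes and 3000 links this gives about 0.075, which is the baseline a measured clustering coefficient would be compared against.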

Figure 4 shows the CC for PROSA and a corresponding random graph
for different network sizes, in the case of 15 performed queries per node.
The CC for PROSA is from 2.5 to 6 times higher than that of a
corresponding random graph, in accordance with the CC observed in real
small-world networks. This result is quite simple to explain, since nodes in
PROSA are linked principally to similar peers, i.e. to peers that share
the same kind of resources, while being linked to other peers at
random. Due to the linking strategy used in PROSA, it is very likely
that neighbours of a peer are also linked together, and this increases the
clustering coefficient.

Fig. 4. Clustering coefficient for PROSA and the corresponding random graph

4 Conclusions and future work 

PROSA is a P2P system mainly inspired by social networks and
behaviours. Topological properties of PROSA suggest that it naturally
evolves into a small-world network, with a very short average path length
and a high clustering coefficient. More results about query efficiency are
reported in [3]. Future work includes examining in depth the internal
structure of PROSA networks and studying the emergence of communities
of similar peers.